-
- The Text Converter - TEC 1.0
-
- (c) 1994 Martin Mares, MJSoft System Software
-
- ================================================================================
-
-
- Copyright:
- ----------
-
- TEC and its documentation are Copyright (c) Martin Mares, MJSoft System
- Software, Prague, Czech Republic.
-
- This archive can be freely redistributed, as long as all of its files are
- included in their original form without any additions, deletions or
- modifications, and no more than a nominal fee is charged for its distribution.
- All copyright notices in the programs and accompanying documentation files
- must remain in place. Also, '.displayme' and other similar files may
- not be added. This is generally known as FREEWARE.
-
- Special permission is given to Fred Fish to distribute this program on his
- "Fish Disks".
-
- This software is provided "AS IS" without warranty of any kind, either
- expressed or implied. The author is not responsible for any damage caused by
- it.
-
-
- Introduction
- ------------
-
- Almost every programmer occasionally needs to convert a text file into
- another form (for example, AmigaGuide to plain text, or stripping ANSI
- sequences), and this usually ends in writing a small utility for the job.
- These utilities are very similar to each other, and most of them contain the
- same routines for input and output buffering, because the buffered I/O
- provided by DOS is terribly slow. This is why I decided to write TEC.
-
- TEC is a simple tool designed to simplify many text conversion tasks. It
- acts as a state machine with one input, one output and one internal string
- register, so field-oriented conversions are better left to other programs
- (such as awk).
-
- TEC requires OS 2.04 or higher and the ss.library.
-
- TEC is pure and can be made resident.
-
-
- Invocation:
- -----------
-
- TEC can be started only from the CLI and takes the following parameters:
-
- FILTER/M/A - a list of filter programs to be applied. An argument enclosed
- in single quotes is interpreted as a one-line program. If it is a file name,
- the default extension '.tec' is appended. The input file is processed by the
- first filter, the result is passed as input to the second filter, and so on;
- the output of the last filter is written to the output file. In many cases
- a single filter is enough to do the job.
-
- FROM/K, TO/K - names of the source and destination files. If they are
- omitted, standard input and standard output are used instead.
-
- BUF/K/N - buffer size in bytes. The default is 16384 bytes; the minimum is
- 16 bytes.
-
- The conversion may be stopped at any time by pressing CTRL-C or by sending
- the break-C signal to TEC.
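-
- For illustration only (all file names below are made up): the first command
- applies the filter file 'strip' (with the default extension '.tec') to
- 'manual.guide' and writes 'manual.txt'; the second chains two filters and
- asks for a 64 KB buffer.
-
-     TEC strip FROM manual.guide TO manual.txt
-     TEC strip tabs FROM manual.guide TO manual.txt BUF 65536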
-
-
- The language:
- -------------
-
- The basic elements of the language are:
-
- - comments - everything from '%' to the end of the line is ignored (unless
-   the percent sign is part of a character or string constant)
- - separators - the semicolon and the end-of-line character
- - characters - (a) specified by decimal code
-                (b) specified by hexadecimal code (preceded by $)
-                (c) character constant (enclosed in single quotes). It may
-                    contain escape sequences (see below).
- - keywords - COPY,MSG,PUT,EOF,STOP,FAIL,NOCASE,ELSE,CLR,ADD,PUTS,CAT,SWITCH,
-   CSWITCH,USE,GLOBAL,CASE,BACK.
- - state names - sequences of letters, digits and underscores that do not
-   start with a digit.
- - strings - sequences of any characters (including escapes) enclosed in double
-   quotes. A string cannot span more than one line unless the end of the line
-   is immediately preceded by a backslash (an ignored escape sequence).
- - escape sequences - beginning with a backslash
-
-     \t - tab        \n - newline     \\ - backslash
-     \' - '          \r - return      \e - escape (char #27)
-     \" - "
-
- In addition to these rules, a backslash followed immediately by a newline is
- ignored, which allows long commands to be split across several lines of
- source text.
-
- The language itself is not case-sensitive, but the rules written in it
- usually are.
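-
- As a small illustration (assuming the usual ASCII codes, where 'A' is 65
- decimal and 41 hexadecimal), the following three commands should all write
- the same character:
-
-     put 65          % decimal code
-     put $41         % hexadecimal code
-     put 'A'         % character constant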
-
-
- Program:
- --------
-
- Each program consists of so-called states. The interpreter is in exactly one
- state at any moment. The conversion starts in the first state, regardless of
- its name.
-
- It isn't necessary to name the first state unless a GLOBAL precedes it (see
- "Global definitions" below).
-
- 'Simple' state definition:
- --------------------------
-
- <state name>: <commands>
-
- When a state of this type is entered, the <commands> are executed and the
- program stops, unless the command sequence ends with the name of another
- state in which the program is to continue. The commands may be separated by
- semicolons or newlines, but this does not affect their execution in any way.
-
- 'Complex' state definition:
- ---------------------------
-
- <state name>: [<init commands>] [USE <state>] {<charlist> <commands> <sep>}
- [ELSE <commands>]
-
- In this case, the <init commands> are executed, then one character is read
- from the input and the interpreter looks up the <commands> that correspond to
- that character. If no <charlist> contains the character just read, the
- <commands> after ELSE are executed. Init commands may be separated by <sep>,
- but this does not affect their execution in any way.
-
- <charlist> - a list of characters separated by white space. It may contain
- the EOF keyword, which stands for the end-of-file condition.
-
- <commands> - any command list (see below). If none is specified, the input
- character is thrown away. If no next state is specified, the current state is
- used again. There is one exception to these rules: the default action for EOF
- is the STOP command, which stops execution immediately.
-
- <sep> - separator - a semicolon or the end of a line
-
- The ELSE part may be omitted; it is then automatically replaced by ELSE COPY
- (the current character is copied to the output stream without any changes).
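-
- A minimal sketch of a complex state, written from the rules above and not
- tested against TEC itself; the state name 'main' is arbitrary. It drops
- carriage returns, turns each tab into a space and copies everything else:
-
-     main:   '\r'                % no commands: the character is thrown away
-             '\t' put ' '        % a tab is replaced by one space
-
- No ELSE part is given, so all other characters fall under the implicit
- ELSE COPY, and the default EOF action stops the program at the end of input.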
-
- The USE keyword causes the character conditions of the current state to be
- derived from the given state, which may contain ONLY character conditions
- (that is, NO init commands). Warning: the ELSE <commands> phrase has
- _no_effect_, because all character conditions are taken from the state being
- derived from. Another warning: conditions that name no next state (and so
- reuse their own state as the next one) keep the original state as their
- destination, because they were fixed before the USE command. Therefore
-
-     alpha: '1' '2' put '@'
-     gamma: use alpha '4' put '!' else put '>'
-
- is equivalent to:
-
-     alpha: '1' '2' put '@'
-     gamma: '1' '2' put '@' alpha ; '4' put '!' gamma ; else copy alpha
-
- As you can see above, the USE command is of very limited use; it will
- probably be improved in future releases.
-
- Command lists:
- --------------
-
- {<command>} [<state name>]
-
- The commands are executed in the order in which they are written; the
- converter then continues with <state name>, if one is given.
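-
- The following sketch shows how next-state names at the end of command lists
- drive the machine. It is meant to strip simple ANSI sequences of the form
- ESC [ <digits and semicolons> <final character>. The state names are invented
- for the example, and the program has not been run through TEC:
-
-     text:   '\e' esc            % ESC: drop it, continue in state esc
-     esc:    '[' csi             % '[': drop it, continue in state csi
-             else text           % anything else: drop it, back to text
-     csi:    '0' '1' '2' '3' '4' '5' '6' '7' '8' '9' ';'   % parameters
-             else text           % final character: drop it, back to text
-
- State 'text' has no ELSE part, so ordinary characters are copied unchanged.
- The parameter characters in 'csi' have no commands and no next state, so they
- are thrown away and the machine stays in 'csi'.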
-
-
- Basic commands:
- ---------------
-
- COPY - copy the most recently read character to the output stream
- PUT <character> - copy the given character to the output stream
- > <character> - synonym for PUT
- STOP - stop the conversion
- FAIL - stop the conversion and exit with RC=10
- CLR - clear the contents of the string buffer
- ADD - append the most recently read character to the string buffer (which
-       holds at most 255 characters)
- PUTS - copy the contents of the string buffer to the output stream
- BACK - push the most recently read character back to the input stream. This
-        may be done only ONCE until the character is read again.
-
-
- Other commands:
- ---------------
-
- PUT <string> - copy the string to the output stream
- CAT <string> - append the string to the string buffer
- MSG <string> - copy the string to standard output
-
-
- Switching:
- ----------
-
- You may test the contents of the string buffer with the SWITCH and CSWITCH
- commands. These tests may appear only in the <init commands> of a state. The
- only difference between SWITCH and CSWITCH is that CSWITCH is case-sensitive.
-
- SWITCH {<string> <state>} ELSE <the command list continues here>
-
- The interpreter compares the current contents of the string buffer with the
- strings in the SWITCH command, in first-to-last order, and goes to the <state>
- defined for the first string that equals the buffer. If no string matches the
- buffer, execution continues after the ELSE keyword.
-
- For example, CSWITCH "aaa" aaa ; "bbb" bbb ; "aaa" ccc ELSE STOP never enters
- the state ccc, because the first "aaa" entry always matches before the third
- one is tried.
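-
- A sketch of how CLR, ADD, PUTS and CSWITCH might work together (untested; all
- state names are made up). It replaces the entities &lt; &gt; and &amp; with
- the characters they stand for and writes any other &name; sequence back
- unchanged:
-
-     text:   '&' clr entity      % start of an entity: empty the buffer
-     entity: ';' lookup          % end of the name: go and test the buffer
-             else add            % collect the name in the string buffer
-     lookup: cswitch "lt" lt ; "gt" gt ; "amp" amp else put '&' puts put ';' text
-     lt:     put '<' text
-     gt:     put '>' text
-     amp:    put '&' text
-
- The example assumes that the commands of a state without character conditions
- (here 'lookup') count as <init commands>, so that CSWITCH is allowed there.
- It also assumes that entity names fit in the 255-character string buffer.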
-
-
- Global definitions:
- -------------------
-
- It is possible to write conditions that apply to the characters read in ALL
- states (although a state may override them simply by redefining them). These
- global conditions are called GLOBALs and are defined before the first state
- of the program.
-
- GLOBAL {<charlist> [<basic command>] [<state>]} [ELSE <basic command>
- <state>]
-
- There are two differences between standard state definitions and GLOBALs:
- (1) GLOBALs can contain only basic commands (not the more complex ones).
- (2) References to the current state are not fixed immediately, so you can,
- for example, convert each 'A' into 'a' without changing the current state.
- This is why GLOBALs are usually preferable to USEs.
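-
- A sketch of a GLOBAL in use (untested; the state names are invented). Tabs
- become spaces in every state, and because the GLOBAL names no next state, the
- machine stays in whichever state it was in:
-
-     global  '\t' put ' '        % applies in both states below
-     plain:  '\"' copy quoted    % opening quote: copy it, enter quoted
-     quoted: '\"' copy plain     % closing quote: copy it, return to plain
-             ' ' put '_'         % inside quotes, a space becomes '_'
-
- The first state is named here because a GLOBAL precedes it.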
-
-
- Case sensitivity:
- -----------------
-
- By default, all character comparisons are case-sensitive.
-
- If you say NOCASE before a <charlist>, each letter in each <charlist> of the
- current state (unless you say CASE) is automatically expanded to both cases
- ('a' becomes 'a' 'A').
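-
- For example (a sketch; the state name is arbitrary), the state
-
-     vowels: nocase 'a' 'e' 'i' 'o' 'u' put '*'
-
- should behave like
-
-     vowels: 'a' 'A' 'e' 'E' 'i' 'I' 'o' 'O' 'u' 'U' put '*'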
-
-
- Final words:
- ------------
-
- Send bug reports, comments and nice conversion tables to
- mjsoft@k332.feld.cvut.cz.
-
- This language is very simple and probably does not allow everything you need.
- The command set will be extended in some future version (if I have the free
- time to do it): numeric registers and more string registers will be added, the
- USE mechanism will be made slightly more user-friendly and ... (mail me what
- you would like to be added).
-
- Thanks to Short Software for some good ideas.
-